



Data-Driven Estimation of the False Positive Rate of the Bayes Binary Classifier via Soft Labels

Jeong, Minoh, Cardone, Martina, Dytso, Alex

arXiv.org Artificial Intelligence

Classification is a fundamental task in many applications in which data-driven methods have shown outstanding performance. However, it is challenging to determine whether such methods have achieved the optimal performance. This is mainly because the best achievable performance is typically unknown; hence, estimating it effectively is of prime importance. In this paper, we consider binary classification problems and propose an estimator, computed from a given dataset, for the false positive rate (FPR) of the Bayes classifier, that is, the optimal classifier with respect to accuracy. Our method utilizes soft labels, or real-valued labels, which are gaining significant traction thanks to their properties. We thoroughly examine various theoretical properties of our estimator, including its consistency, unbiasedness, rate of convergence, and variance. To extend our estimator beyond soft labels, we also consider noisy labels, which encompass binary labels. For noisy labels, we develop effective FPR estimators by leveraging a denoising technique and the Nadaraya-Watson estimator. Due to the symmetry of the problem, our results can be readily applied to estimate the false negative rate of the Bayes classifier.
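As a rough illustration of the quantity being estimated (not the authors' estimator), suppose each soft label $s_i$ equals the posterior probability $\eta(x_i) = P(Y=1 \mid X=x_i)$. The Bayes classifier predicts 1 when $\eta(x) > 1/2$, so its FPR is $E[(1-\eta(X))\mathbf{1}\{\eta(X)>1/2\}] / E[1-\eta(X)]$, and a plug-in version replaces both expectations with sample averages. For binary labels, the sketch first smooths them into soft labels with a Nadaraya-Watson estimator, using an assumed Gaussian kernel and bandwidth `h`; the paper's denoising step is omitted. All function names are hypothetical.

```python
import math
from typing import Sequence


def bayes_fpr_from_soft_labels(soft: Sequence[float]) -> float:
    """Plug-in FPR of the Bayes classifier, assuming soft[i] = P(Y=1 | X=x_i).

    FPR = P(predict 1 | Y = 0) = E[(1 - eta) * 1{eta > 1/2}] / E[1 - eta],
    with both expectations replaced by sample means (the 1/n factors cancel).
    """
    false_pos_mass = sum(1.0 - s for s in soft if s > 0.5)  # Bayes rule says 1, truth is 0
    negative_mass = sum(1.0 - s for s in soft)              # total probability of Y = 0
    return false_pos_mass / negative_mass


def nadaraya_watson(x: float, xs: Sequence[float], ys: Sequence[int], h: float) -> float:
    """Nadaraya-Watson estimate of P(Y=1 | X=x) from 1-D features and 0/1 labels."""
    weights = [math.exp(-0.5 * ((x - xj) / h) ** 2) for xj in xs]  # Gaussian kernel
    return sum(w * yj for w, yj in zip(weights, ys)) / sum(weights)


if __name__ == "__main__":
    # Soft labels available directly: 0.5 / 2.2, roughly 0.227.
    print(bayes_fpr_from_soft_labels([0.9, 0.2, 0.6, 0.1]))

    # Binary labels: smooth them into soft labels first, then plug in.
    xs = [0.0, 0.5, 1.0, 4.0, 4.5, 5.0]
    ys = [0, 0, 1, 1, 1, 0]
    soft = [nadaraya_watson(x, xs, ys, h=1.0) for x in xs]
    print(bayes_fpr_from_soft_labels(soft))
```

The threshold uses a strict `> 0.5`, so ties at exactly 1/2 are assigned to the negative class; the function is undefined if every soft label equals 1.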


$L^1$ Estimation: On the Optimality of Linear Estimators

Barnes, Leighton P., Dytso, Alex, Liu, Jingbo, Poor, H. Vincent

arXiv.org Machine Learning

Consider the problem of estimating a random variable $X$ from noisy observations $Y = X+ Z$, where $Z$ is standard normal, under the $L^1$ fidelity criterion. It is well known that the optimal Bayesian estimator in this setting is the conditional median. This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian. Along the way, several other results are presented. In particular, it is demonstrated that if the conditional distribution $P_{X|Y=y}$ is symmetric for all $y$, then $X$ must follow a Gaussian distribution. Additionally, we consider other $L^p$ losses and observe the following phenomenon: for $p \in [1,2]$, Gaussian is the only prior distribution that induces a linear optimal Bayesian estimator, and for $p \in (2,\infty)$, infinitely many prior distributions on $X$ can induce linearity. Finally, extensions are provided to encompass noise models leading to conditional distributions from certain exponential families.
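The Gaussian case can be checked numerically. With $X \sim \mathcal{N}(0,\sigma^2)$ and $Y = X + Z$, the posterior is $\mathcal{N}\big(\tfrac{\sigma^2}{1+\sigma^2}y,\ \tfrac{\sigma^2}{1+\sigma^2}\big)$, so the conditional median is the linear map $y \mapsto \tfrac{\sigma^2}{1+\sigma^2}y$. The grid-based sketch below computes the conditional median for an arbitrary prior density; a uniform prior on $[-1,1]$ (an illustrative choice, not one discussed in the abstract) gives a median that saturates and is therefore not linear. This is a numerical illustration only, not the paper's proof, and the function names are hypothetical.

```python
import math
from typing import Callable


def conditional_median(prior_pdf: Callable[[float], float], y: float,
                       lo: float = -10.0, hi: float = 10.0, n: int = 20001) -> float:
    """Median of P_{X|Y=y} for Y = X + Z with Z ~ N(0, 1), computed on a uniform grid."""
    step = (hi - lo) / (n - 1)
    xs = [lo + i * step for i in range(n)]
    # Unnormalized posterior: prior(x) * phi(y - x); normalizing constants cancel.
    weights = [prior_pdf(x) * math.exp(-0.5 * (y - x) ** 2) for x in xs]
    half = 0.5 * sum(weights)
    acc = 0.0
    for x, w in zip(xs, weights):
        acc += w
        if acc >= half:  # first grid point where the posterior CDF reaches 1/2
            return x
    return xs[-1]


def gaussian_prior(sigma2: float) -> Callable[[float], float]:
    """Unnormalized N(0, sigma2) density."""
    return lambda x: math.exp(-0.5 * x * x / sigma2)


def uniform_prior(x: float) -> float:
    """Uniform density on [-1, 1]."""
    return 0.5 if -1.0 <= x <= 1.0 else 0.0


if __name__ == "__main__":
    sigma2 = 1.0
    for y in (-2.0, 1.0, 3.0):
        # Gaussian prior: the median tracks sigma2 / (1 + sigma2) * y.
        print(y, conditional_median(gaussian_prior(sigma2), y), sigma2 / (1 + sigma2) * y)
    for y in (0.5, 4.0):
        # Uniform prior: median / y changes with y, so the estimator is not linear.
        print(y, conditional_median(uniform_prior, y) / y)
```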


CoNIC Challenge: Pushing the Frontiers of Nuclear Detection, Segmentation, Classification and Counting

Graham, Simon, Vu, Quoc Dang, Jahanifar, Mostafa, Weigert, Martin, Schmidt, Uwe, Zhang, Wenhua, Zhang, Jun, Yang, Sen, Xiang, Jinxi, Wang, Xiyue, Rumberger, Josef Lorenz, Baumann, Elias, Hirsch, Peter, Liu, Lihao, Hong, Chenyang, Aviles-Rivero, Angelica I., Jain, Ayushi, Ahn, Heeyoung, Hong, Yiyu, Azzuni, Hussam, Xu, Min, Yaqub, Mohammad, Blache, Marie-Claire, Piégu, Benoît, Vernay, Bertrand, Scherr, Tim, Böhland, Moritz, Löffler, Katharina, Li, Jiachen, Ying, Weiqin, Wang, Chixin, Kainmueller, Dagmar, Schönlieb, Carola-Bibiane, Liu, Shuolin, Talsania, Dhairya, Meda, Yughender, Mishra, Prakash, Ridzuan, Muhammad, Neumann, Oliver, Schilling, Marcel P., Reischl, Markus, Mikut, Ralf, Huang, Banban, Chien, Hsiang-Chin, Wang, Ching-Ping, Lee, Chia-Yen, Lin, Hong-Kun, Liu, Zaiyi, Pan, Xipeng, Han, Chu, Cheng, Jijun, Dawood, Muhammad, Deshpande, Srijay, Bashir, Raja Muhammad Saad, Shephard, Adam, Costa, Pedro, Nunes, João D., Campilho, Aurélio, Cardoso, Jaime S., S, Hrishikesh P, Puthussery, Densen, G, Devika R, C, Jiji V, Zhang, Ye, Fang, Zijie, Lin, Zhifan, Zhang, Yongbing, Lin, Chunhui, Zhang, Liukun, Mao, Lijian, Wu, Min, Vo, Vi Thi-Tuong, Kim, Soo-Hyung, Lee, Taebum, Kondo, Satoshi, Kasai, Satoshi, Dumbhare, Pranay, Phuse, Vedant, Dubey, Yash, Jamthikar, Ankush, Vuong, Trinh Thi Le, Kwak, Jin Tae, Ziaei, Dorsa, Jung, Hyun, Miao, Tianyi, Snead, David, Raza, Shan E Ahmed, Minhas, Fayyaz, Rajpoot, Nasir M.

arXiv.org Artificial Intelligence

Nuclear detection, segmentation and morphometric profiling are essential in helping us further understand the relationship between histology and patient outcome. To drive innovation in this area, we set up a community-wide challenge using the largest available dataset of its kind to assess nuclear segmentation and cellular composition. Our challenge, named CoNIC, stimulated the development of reproducible algorithms for cellular recognition with real-time result inspection on public leaderboards. We conducted an extensive post-challenge analysis based on the top-performing models using 1,658 whole-slide images of colon tissue. With around 700 million detected nuclei per model, associated features were used for dysplasia grading and survival analysis, where we demonstrated that the challenge's improvement over the previous state-of-the-art led to significant boosts in downstream performance. Our findings also suggest that eosinophils and neutrophils play an important role in the tumour microenvironment. We release challenge models and WSI-level results to foster the development of further methods for biomarker discovery.


Fast and Accurate FSA System Using ELBERT: An Efficient and Lightweight BERT

Lu, Siyuan, Zhou, Chenchen, Xie, Keli, Lin, Jun, Wang, Zhongfeng

arXiv.org Artificial Intelligence

With the development of deep learning and Transformer-based pre-trained models like BERT, the accuracy of many NLP tasks has been dramatically improved. However, the large number of parameters and computations also pose challenges for their deployment. For instance, using BERT can improve the predictions in the financial sentiment analysis (FSA) task but slow it down, where speed and accuracy are equally important in terms of profits. To address these issues, we first propose an efficient and lightweight BERT (ELBERT) along with a novel confidence-window-based (CWB) early exit mechanism. Based on ELBERT, an innovative method to accelerate text processing on the GPU platform is developed, solving the difficult problem of making the early exit mechanism work more effectively with a large input batch size. Afterward, a fast and high-accuracy FSA system is built. Experimental results show that the proposed CWB early exit mechanism achieves significantly higher accuracy than existing early exit methods on BERT under the same computation cost. By using this acceleration method, our FSA system can boost the processing speed by nearly 40 times to over 1000 texts per second with sufficient accuracy, which is nearly twice as fast as FastBERT, thus providing a more powerful text processing capability for modern trading systems.
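The abstract does not define the confidence-window-based (CWB) rule, so the sketch below is a hypothetical reading of early exiting in general, not ELBERT's mechanism. It assumes an exit classifier after every layer and combines a FastBERT-style entropy threshold with a window: an input exits once the last `window` exits are all confident (entropy below `threshold`) and agree on the label. With `window=1` it reduces to a plain entropy-threshold exit. All names and parameters are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def argmax(p: Sequence[float]) -> int:
    return max(range(len(p)), key=lambda k: p[k])


def early_exit(layer_probs: List[Sequence[float]], threshold: float,
               window: int = 1) -> Tuple[int, int]:
    """Return (exit_layer, predicted_label) for one input.

    layer_probs[i] is the class distribution from a hypothetical exit classifier
    after layer i. The input exits at the first layer where the last `window`
    exits are all confident and predict the same label; otherwise it runs to the end.
    """
    streak = 0
    prev_label = None
    for i, p in enumerate(layer_probs):
        label = argmax(p)
        confident = entropy(p) < threshold
        if confident and label == prev_label:
            streak += 1          # extend the run of agreeing confident exits
        elif confident:
            streak = 1           # confident, but the label changed: restart the run
        else:
            streak = 0           # not confident: no run
        prev_label = label
        if streak >= window:
            return i, label
    return len(layer_probs) - 1, argmax(layer_probs[-1])


if __name__ == "__main__":
    probs = [[0.6, 0.4], [0.95, 0.05], [0.97, 0.03], [0.99, 0.01]]
    print(early_exit(probs, threshold=0.3, window=1))  # exits after layer 1
    print(early_exit(probs, threshold=0.3, window=2))  # waits one more layer
```

With batched inputs, one option (an assumption here) is to drop exited samples from the batch after each layer so that deeper layers run on a shrinking batch; the abstract does not say how ELBERT handles this on the GPU.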


Image Translation Based Nuclei Segmentation for Immunohistochemistry Images

Trullo, Roger, Bui, Quoc-Anh, Tang, Qi, Olfati-Saber, Reza

arXiv.org Artificial Intelligence

Numerous deep-learning-based methods have been developed for nuclei segmentation in H&E images and have achieved close to human performance. However, directly applying such methods to another image modality, such as immunohistochemistry (IHC) images, may not achieve satisfactory performance. Thus, we developed a Generative Adversarial Network (GAN) based approach that translates an IHC image to an H&E image while preserving nuclei location and morphology, and then applied pre-trained nuclei segmentation models to the virtual H&E image. Using two public IHC image datasets, we demonstrated that the proposed methods work better than several baselines, including direct application of state-of-the-art nuclei segmentation methods trained on H&E, such as Cellpose and HoVer-Net, and a generative method, DeepLIIF.


Remote AR/VR openings in Boston on August 06, 2022

#artificialintelligence

Role in Los Angeles. Qualifications:
• 3 years of experience in enterprise product management
• 1 year of experience managing immersive (AR/VR) products specifically
• Bachelor's degree in HCI or engineering (or equivalent on-the-job experience)
• 3 years of experience in a dynamic product management role
• Proven experience overseeing all elements of the product development lifecycle
• Highly effective cross-functional team management
• Previous experience delivering finely tuned product marketing strategies
• Exceptional writing and editing skills combined with strong presentation and public speaking skills
• Outstanding portfolio and professional references
• Proven track record of delivering complex products to difficult markets

BadVR is the world's first immersive data visualization and analytics platform. BadVR brings data into high definition, making it easier to discover and identify hidden problems and opportunities and helping businesses make better decisions faster. Based in Manhattan Beach, CA, the rapidly growing tech startup has attracted industry attention with its pioneering AR and VR demos, which allow people to, quite literally, 'step inside their data.' Our product is already empowering users across America through our work with Magic Leap, UNDP, the National Science Foundation, and more. But we are just getting started!


AI/ML, Data Science Jobs #hiring

#artificialintelligence

Johnson & Johnson (J&J) is an American multinational corporation founded in 1886 that develops medical devices, pharmaceuticals, and consumer packaged goods. Its common stock is a component of the Dow Jones Industrial Average and the company is ranked No. 36 on the 2021 Fortune 500 list of the largest United States corporations by total revenue.


Cellular Network Radio Propagation Modeling with Deep Convolutional Neural Networks

Zhang, Xin, Shu, Xiujun, Zhang, Bingwen, Ren, Jie, Zhou, Lizhou, Chen, Xin

arXiv.org Artificial Intelligence

Radio propagation modeling and prediction is fundamental for modern cellular network planning and optimization. Conventional radio propagation models fall into two categories. Empirical models, based on coarse statistics, are simple and computationally efficient, but are inaccurate due to oversimplification. Deterministic models, such as ray tracing based on the physical laws of wave propagation, are more accurate and site-specific, but they have higher computational complexity and cannot flexibly use site information other than traditional geographic information system (GIS) maps. In this article, we present a novel method to model radio propagation using deep convolutional neural networks and report significantly improved performance compared to conventional models. We also lay down the framework for data-driven modeling of radio propagation, enabling future research to use rich and unconventional site information, e.g., satellite photos, to provide more accurate and flexible models.
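For contrast with the learned model, the empirical category mentioned above can be as simple as the textbook log-distance path-loss model, $PL(d) = PL_0 + 10\,n\log_{10}(d/d_0)$, whose two parameters are fitted to measurements by least squares. The sketch is a generic example of an empirical model, not one of the baselines from the paper; the reference distance $d_0 = 1$ m, the function names, and the synthetic data are assumptions.

```python
import math
from typing import Sequence, Tuple


def fit_log_distance(distances_m: Sequence[float], path_loss_db: Sequence[float],
                     d0: float = 1.0) -> Tuple[float, float]:
    """Least-squares fit of PL(d) = PL0 + 10 * n * log10(d / d0); returns (PL0, n)."""
    xs = [10.0 * math.log10(d / d0) for d in distances_m]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(path_loss_db) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, path_loss_db))
    n = sxy / sxx                 # path-loss exponent (2 in free space)
    pl0 = mean_y - n * mean_x     # path loss at the reference distance d0, in dB
    return pl0, n


def predict_path_loss(d_m: float, pl0: float, n: float, d0: float = 1.0) -> float:
    """Predicted path loss in dB at distance d_m (meters)."""
    return pl0 + 10.0 * n * math.log10(d_m / d0)


if __name__ == "__main__":
    # Synthetic noise-free "measurements" generated with PL0 = 40 dB and n = 3.5.
    ds = [10.0, 50.0, 100.0, 500.0, 1000.0]
    pl = [40.0 + 35.0 * math.log10(d) for d in ds]
    pl0, n = fit_log_distance(ds, pl)
    print(pl0, n, predict_path_loss(200.0, pl0, n))
```

The model's two parameters capture only distance-dependent attenuation; buildings, terrain, and other site features are ignored, which is the kind of oversimplification the abstract attributes to empirical models.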